ABSTRACT

Recommender systems improve access to relevant products 
and information by making personalized suggestions based 
on previous examples of a user's likes and dislikes. Most ex- 
isting recommender systems use social filtering methods that 
base recommendations on other users' preferences. By con- 
trast, content-based methods use information about an item 
itself to make suggestions. This approach has the advantage 
of being able to recommend previously unrated items to
users with unique interests and to provide explanations for
its recommendations. We describe a content-based book recommending
system that utilizes information extraction and
a machine-learning algorithm for text categorization. Initial 
experimental results demonstrate that this approach can pro- 
duce accurate recommendations. 

KEYWORDS: Recommender systems, information filtering, 
machine learning, text categorization 

INTRODUCTION 

There is a growing interest in recommender systems that sug- 
gest music, films, books, and other products and services to 
users based on examples of their likes and dislikes [19, 26,
11]. A number of successful startup companies like Firefly,
Net Perceptions, and LikeMinds have formed to provide
recommending technology. On-line book stores like Ama- 
zon and BarnesAndNoble have popular recommendation ser- 
vices, and many libraries have a long history of providing 
reader's advisory services [g, |Tj|. Such services are im- 
portant since readers' preferences are often complex and not 
readily reduced to keywords or standard subject categories, 
but rather best illustrated by example. Digital libraries should
be able to build on this tradition of assisting readers by providing
cost-effective, informed, and personalized automated
recommendations for their patrons. 

Existing recommender systems almost exclusively utilize a 
form of computerized matchmaking called collaborative or 
social filtering. The system maintains a database of the pref- 
erences of individual users, finds other users whose known 
preferences correlate significantly with a given patron, and 
recommends to a person other items enjoyed by their matched 
patrons. This approach assumes that a given user's tastes are
generally the same as those of another user of the system and that a
sufficient number of user ratings are available. Items that 
have not been rated by a sufficient number of users cannot 
be effectively recommended. Unfortunately, statistics on li- 
brary use indicate that most books are utilized by very few 
patrons [12]. Therefore, collaborative approaches naturally
tend to recommend popular titles, perpetuating homogene- 
ity in reading choices. Also, since significant information 
about other users is required to make recommendations, this 
approach raises concerns about privacy and access to propri- 
etary customer data. 

Learning individualized profiles from descriptions of exam- 
ples {content-based recommending [^), on the other hand, 
allows a system to uniquely characterize each patron with- 
out having to match their interests to someone else's. Items 
are recommended based on information about the item itself 
rather than on the preferences of other users. This also allows 
for the possibility of providing explanations that list content 
features that caused an item to be recommended, potentially
giving readers confidence in the system's recommendations 
and insight into their own preferences. Finally, a content- 
based approach can allow users to provide initial subject in- 
formation to aid the system. 

Machine learning for text-categorization has been applied to 
content-based recommending of web pages [ PSQ and newsgroup
messages [15]; however, to our knowledge, it has not
previously been applied to book recommending. We have
been exploring content-based book recommending by applying
automated text-categorization methods to semi-structured
text extracted from the web. Our current prototype system,
Libra (Learning Intelligent Book Recommending Agent),
uses a database of book information extracted from web pages 
at Amazon.com. Users provide 1-10 ratings for a selected set 
of training books; the system then learns a profile of the user 
using a Bayesian learning algorithm and produces a ranked 
list of the most recommended additional titles from the sys- 
tem's catalog. 

As evidence for the promise of this approach, we present ini- 
tial experimental results on several data sets of books ran- 
domly selected from particular genres such as mystery, sci- 
ence, literary fiction, and science fiction and rated by differ- 
ent users. We use standard experimental methodology from 
machine learning and present results for several evaluation 
metrics on independent test data including rank correlation 
coefficient and average rating of top-ranked books. 

The remainder of the paper is organized as follows. Section 
2 provides an overview of the system including the algorithm 
used to learn user profiles. Section 3 presents results of our 
initial experimental evaluation of the system. Section 4 discusses
topics for further research, and Section 5 presents our
conclusions on the advantages and promise of content-based 
book recommending. 

SYSTEM DESCRIPTION 

Extracting Information and Building a Database 

First, an Amazon subject search is performed to obtain a 
list of book-description URLs of broadly relevant titles. Libra
then downloads each of these pages and uses a simple
pattern-based information-extraction system to extract data 
about each title. Information extraction (IE) is the task of locating
specific pieces of information in a document, thereby
obtaining useful structured data from unstructured text |16 g|.
Specifically, it involves finding a set of substrings from
the document, called fillers, for each of a set of specified
slots. When applied to web pages instead of natural language
text, such an extractor is sometimes called a wrapper IM].
The current slots utilized by the recommender are: title, au- 
thors, synopses, published reviews, customer comments, re- 
lated authors, related titles, and subject terms. Amazon pro- 
duces the information about related authors and titles using 
collaborative methods; however, Libra simply treats them
as additional content about the book. Only books that have at
least one synopsis, review or customer comment are retained
as having adequate content information. A number of other
slots are also extracted (e.g. publisher, date, ISBN, price,
etc.) but are currently not used by the recommender. We
have initially assembled databases for literary fiction (3,061 
titles), science fiction (3,813 titles), mystery (7,285 titles), 
and science (6,177 titles). 

Since the layout of Amazon's automatically generated pages 
is quite regular, a fairly simple extraction system is sufficient.
Libra's extractor employs a simple pattern matcher
that uses pre-filler, filler, and post-filler patterns for each slot,
as described by [ra]. In other applications, more sophisticated
information extraction methods and inductive learning of extraction
rules might be useful [pp.
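
The pre-filler, filler, and post-filler idea can be sketched with regular expressions. This is only an illustration under stated assumptions: the markup, the `SLOT_PATTERNS` table, and the `extract_slots` helper are hypothetical and do not reproduce Libra's Lisp extractor or Amazon's actual page layout.

```python
import re

# Hypothetical slot patterns: (pre-filler, filler, post-filler) regex triples.
# The markup is invented for illustration; it is not Amazon's page layout.
SLOT_PATTERNS = {
    "title": (r"<h1>", r"[^<]+", r"</h1>"),
    "authors": (r'<span class="author">', r"[^<]+", r"</span>"),
}


def extract_slots(page, patterns=SLOT_PATTERNS):
    """Return {slot: [filler, ...]} for each filler found between its pre- and post-filler."""
    slots = {}
    for slot, (pre, filler, post) in patterns.items():
        regex = re.compile(pre + "(" + filler + ")" + post)
        slots[slot] = [m.group(1).strip() for m in regex.finditer(page)]
    return slots


page = ('<h1>The Fabric of Reality</h1>'
        '<span class="author">David Deutsch</span>')
extracted = extract_slots(page)
print(extracted)
```

A slot with several fillers (for example, multiple authors) simply yields a longer list, which matches the definition of fillers as a set of substrings per slot.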

The text in each slot is then processed into an unordered bag 
of words (tokens) and the examples represented as a vector 
of bags of words (one bag for each slot). A book's title and 
authors are also added to its own related-title and related- 
author slots, since a book is obviously "related" to itself, and 
this allows overlap in these slots with books listed as related 
to it. Some minor additions include the removal of a small list 
of stop-words, the preprocessing of author names into unique 
tokens of the form first-initial_last-name and the grouping of 
the words associated with synopses, published reviews, and 
customer comments all into one bag (called "words"). 
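
As a rough sketch of this representation, the function below turns extracted slot text into one bag of tokens per slot. The stop-word list, the tokenizer, and the dictionary keys are assumptions made for illustration; the paper does not specify them.

```python
import re
from collections import Counter

# Assumed stop-word list; the paper only mentions "a small list of stop-words".
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}


def author_token(name):
    """Turn 'David Deutsch' into the single token 'd_deutsch'."""
    parts = name.lower().split()
    return parts[0][0] + "_" + parts[-1]


def tokenize(text):
    """Lowercase, keep alphabetic runs, and drop stop-words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def book_to_bags(book):
    """Map extracted slot text to a vector of bags: slot name -> Counter of tokens."""
    bags = {
        "title": Counter(tokenize(book["title"])),
        "authors": Counter(author_token(a) for a in book["authors"]),
        "subjects": Counter(tokenize(" ".join(book["subjects"]))),
        # Synopses, published reviews, and customer comments share one bag.
        "words": Counter(tokenize(" ".join(
            book["synopses"] + book["reviews"] + book["comments"]))),
    }
    # A book is "related" to itself, so its own title and authors are added
    # to the related-title and related-author bags.
    bags["related_titles"] = (
        Counter(tokenize(" ".join(book["related_titles"]))) + bags["title"])
    bags["related_authors"] = (
        Counter(author_token(a) for a in book["related_authors"]) + bags["authors"])
    return bags


book = {
    "title": "The Fabric of Reality",
    "authors": ["David Deutsch"],
    "subjects": ["Reality", "Physics"],
    "synopses": ["A theory of the multiverse."],
    "reviews": [],
    "comments": ["Quantum ideas, quantum worlds."],
    "related_titles": ["The Life of the Cosmos"],
    "related_authors": ["Lee Smolin"],
}
bags = book_to_bags(book)
```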

Learning a Profile 

Next, the user selects and rates a set of training books. By 
searching for particular authors or titles, the user can avoid 
scanning the entire database or picking selections at random. 
The user is asked to provide a discrete 1-10 rating for each 
selected title. 

The inductive learner currently employed by Libra is a bag-of-words
naive Bayesian text classifier [22] extended to handle
a vector of bags rather than a single bag. Recent experimental
results [ p^ pO| ] indicate that this relatively simple approach
to text categorization performs as well as or better than
many competing methods. Libra does not attempt to predict
the exact numerical rating of a title, but rather just a total
ordering (ranking) of titles in order of preference. This task 
is then recast as a probabilistic binary categorization prob- 
lem of predicting the probability that a book would be rated 
as positive rather than negative, where a user rating of 1-5 
is interpreted as negative and 6-10 as positive. As described 
below, the exact numerical ratings of the training examples 
are used to weight the training examples when estimating the 
parameters of the model. 



Specifically, we employ a multinomial text model [20], in
which a document is modeled as an ordered sequence of
word events drawn from the same vocabulary, $V$. The "naive
Bayes" assumption states that the probability of each word
event is dependent on the document class but independent of
the word's context and position. For each class, $c_j$, and word
or token, $w_k \in V$, the probabilities $P(c_j)$ and $P(w_k \mid c_j)$
must be estimated from the training data. Then the posterior
probability of each class given a document, $D$, is computed
using Bayes rule:

\[ P(c_j \mid D) = \frac{P(c_j)}{P(D)} \prod_{i=1}^{|D|} P(a_i \mid c_j) \tag{1} \]

where $a_i$ is the $i$th word in the document, and $|D|$ is the
length of the document in words. Since for any given document,
the prior $P(D)$ is a constant, this factor can be ignored
if all that is desired is a ranking rather than a probability estimate.
A ranking is produced by sorting documents by their
odds ratio, $P(c_1 \mid D) / P(c_0 \mid D)$, where $c_1$ represents the positive
class and $c_0$ represents the negative class. An example
is classified as positive if the odds are greater than 1, and
negative otherwise.

In our case, since books are represented as a vector of "documents,"
$d_m$, one for each slot (where $s_m$ denotes the $m$th
slot), the probability of each word given the category and the
slot, $P(w_k \mid c_j, s_m)$, must be estimated and the posterior category
probabilities for a book, $B$, computed using:

\[ P(c_j \mid B) = \frac{P(c_j)}{P(B)} \prod_{m=1}^{S} \prod_{i=1}^{|d_m|} P(a_{mi} \mid c_j, s_m) \tag{2} \]

where $S$ is the number of slots and $a_{mi}$ is the $i$th word in the
$m$th slot.

Parameters are estimated from the training examples as follows.
Each of the $N$ training books, $B_e$ ($1 \le e \le N$), is given
two real weights, $0 \le \alpha_{ej} \le 1$, based on scaling its user rating,
$r$ ($1 \le r \le 10$): a positive weight, $\alpha_{e1} = (r - 1)/9$, and
a negative weight, $\alpha_{e0} = 1 - \alpha_{e1}$. If a word appears $n$ times
in an example $B_e$, it is counted as occurring $\alpha_{e1} n$ times in a
positive example and $\alpha_{e0} n$ times in a negative example. The
model parameters are therefore estimated as follows:

\[ P(c_j) = \sum_{e=1}^{N} \alpha_{ej} / N \tag{3} \]

\[ P(w_k \mid c_j, s_m) = \sum_{e=1}^{N} \alpha_{ej} n_{kem} / L(c_j, s_m) \tag{4} \]

where $n_{kem}$ is the count of the number of times word $w_k$
appears in example $B_e$ in slot $s_m$, and

\[ L(c_j, s_m) = \sum_{e=1}^{N} \alpha_{ej} |d_m| \tag{5} \]

denotes the total weighted length of the documents in category
$c_j$ and slot $s_m$.

These parameters are "smoothed" using Laplace estimates to 
avoid zero probability estimates for words that do not appear
in the limited training sample by redistributing some of
the probability mass to these items using the method recom- 
mended in [pj[|. Finally, calculation with logarithms of prob- 
abilities is used to avoid underflow. 
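
The estimation and ranking steps in Equations (2) through (5) can be sketched as follows. This is a minimal illustration, not Libra's Lisp implementation: the function names and data layout are hypothetical, and plain add-one (Laplace) smoothing with an assumed vocabulary size stands in for the specific smoothing method cited above.

```python
import math
from collections import defaultdict


def train(books, vocab_size):
    """books: list of (rating in 1..10, {slot: {word: count}}).

    Returns the class priors and a function giving smoothed
    log P(word | class, slot). Add-one smoothing is an assumption.
    """
    n = len(books)
    prior = [0.0, 0.0]                                 # index 0 negative, 1 positive
    counts = [defaultdict(float), defaultdict(float)]  # (slot, word) -> weighted count
    length = [defaultdict(float), defaultdict(float)]  # slot -> weighted length L(c_j, s_m)
    for rating, slots in books:
        pos = (rating - 1) / 9.0                       # alpha_e1
        for j, alpha in enumerate((1.0 - pos, pos)):   # alpha_e0, then alpha_e1
            prior[j] += alpha / n                      # Eq. (3)
            for slot, bag in slots.items():
                for word, k in bag.items():
                    counts[j][(slot, word)] += alpha * k   # numerator of Eq. (4)
                    length[j][slot] += alpha * k           # Eq. (5)

    def log_p(word, j, slot):
        return math.log((counts[j][(slot, word)] + 1.0)
                        / (length[j][slot] + vocab_size))

    return prior, log_p


def log_odds(book_slots, prior, log_p):
    """log of P(c_1 | B) / P(c_0 | B); P(B) cancels and logs avoid underflow.

    Assumes both priors are nonzero, i.e. the ratings are not all 1 or all 10.
    """
    score = math.log(prior[1]) - math.log(prior[0])
    for slot, bag in book_slots.items():
        for word, k in bag.items():
            score += k * (log_p(word, 1, slot) - log_p(word, 0, slot))
    return score


# Toy data, invented for illustration.
training = [
    (10, {"words": {"quantum": 2, "universe": 1}}),
    (1, {"words": {"romance": 3}}),
    (8, {"words": {"universe": 2}}),
]
prior, log_p = train(training, vocab_size=3)
candidates = {
    "A": {"words": {"quantum": 1, "universe": 1}},
    "B": {"words": {"romance": 2}},
}
ranked = sorted(candidates,
                key=lambda t: log_odds(candidates[t], prior, log_p),
                reverse=True)
```

Sorting by log odds gives the same order as sorting by the odds ratio, and a book is classified as positive when its log odds exceed zero.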

The computational complexity of the resulting training (test- 
ing) algorithm is linear in the size of the training (testing) 
data. Empirically, the system is quite efficient. In the experiments
on the Lit1 data described below, the current Lisp
implementation running on a Sun Ultra 1 trained on 20 examples
in an average of 0.4 seconds and on 840 examples in



Slot              Word            Strength
WORDS             ZUBRIN          9.85
WORDS             SMOLIN          9.39
WORDS             TREFIL          8.77
WORDS             DOT             8.67
SUBJECTS          COMPARATIVE     8.39
AUTHOR            D.GOLDSMITH     8.04
WORDS             ALH             7.97
WORDS             MANNED          7.97
RELATED-TITLES    SETTLE          7.91
RELATED-TITLES    CASE            7.91
AUTHOR            R.ZUBRIN        7.63
AUTHOR            R.WAGNER        7.63
AUTHOR            H.MORAVEC       7.63
RELATED-AUTHORS   B.DIGREGORIO    7.63
RELATED-AUTHORS   A.RADFORD       7.63
WORDS             LEE             7.57
WORDS             MORAVEC         7.57
WORDS             WAGNER          7.57
RELATED-TITLES    CONNECTIONIST   7.51
RELATED-TITLES    BELOW           7.51

Table 1: Sample Positive Profile Features



an average of 11.5 seconds, and probabilistically categorized
new test examples at an average rate of about 200 books per 
second. An optimized implementation could no doubt sig- 
nificantly improve performance even further. 

A profile can be partially illustrated by listing the features 
most indicative of a positive or negative rating. Table 1 presents
the top 20 features for a sample profile learned for recommending
science books. Strength measures how much more
likely a word in a slot is to appear in a positively rated book
than a negatively rated one, computed as:

\[ \mathrm{Strength}(w_k, s_j) = \log\bigl(P(w_k \mid c_1, s_j) / P(w_k \mid c_0, s_j)\bigr) \tag{6} \]
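
A small sketch of Equation (6) follows. The probability values are invented for illustration, and the natural logarithm is an assumption, since the base of the logarithm is not stated.

```python
import math


def strength(p_pos, p_neg):
    """Eq. (6): log ratio of smoothed P(w | c_1, s) to P(w | c_0, s)."""
    return math.log(p_pos / p_neg)


# Hypothetical smoothed estimates (positive, negative) for three features.
estimates = {
    ("WORDS", "quantum"): (0.020, 0.001),
    ("WORDS", "romance"): (0.001, 0.030),
    ("AUTHOR", "l_smolin"): (0.010, 0.002),
}

# Listing features from most positive to most negative illustrates a profile.
profile = sorted(estimates, key=lambda f: strength(*estimates[f]), reverse=True)
for slot, word in profile:
    print(slot, word, round(strength(*estimates[(slot, word)]), 2))
```

Because the estimates are smoothed, neither probability is zero and the ratio is always defined.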



Producing, Explaining, and Revising Recommendations 

Once a profile is learned, it is used to predict the preferred 
ranking of the remaining books based on posterior probability
of a positive categorization, and the top-scoring recommendations
are presented to the user.

The system also has a limited ability to "explain" its recommendations
by listing the features that most contributed
to an item's high rank. For example, given the profile illustrated
above, Libra presented the explanation shown in Table 2.
The strength of a cue in this case is multiplied by the number
of times it appears in the description in order to fully
indicate its influence on the ranking. The positiveness of a
feature can in turn be explained by listing the user's training
examples that most influenced its strength, as illustrated in
Table 3, where "Count" gives the number of times the feature
appeared in the description of the rated book. 
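
The explanation step can be sketched as ranking a book's features by strength multiplied by count. The `explain` helper and the numbers below are hypothetical and only illustrate the computation described above.

```python
def explain(book_bags, strengths, top_n=3):
    """Rank a book's features by strength * count, largest contribution first.

    book_bags: {slot: {word: count}} for the recommended book.
    strengths: {(slot, word): strength} taken from the learned profile.
    """
    contributions = []
    for slot, bag in book_bags.items():
        for word, count in bag.items():
            s = strengths.get((slot, word), 0.0)  # unseen features contribute nothing
            contributions.append((slot, word, s * count))
    contributions.sort(key=lambda c: c[2], reverse=True)
    return contributions[:top_n]


# Hypothetical profile strengths and word counts, for illustration only.
strengths = {
    ("WORDS", "multiverse"): 5.0,
    ("WORDS", "universes"): 2.5,
    ("TITLE", "parallel"): 6.0,
}
book_bags = {
    "WORDS": {"multiverse": 15, "universes": 10},
    "TITLE": {"parallel": 1},
}
explanation = explain(book_bags, strengths)
```

A frequently repeated word with moderate strength can therefore outrank a rarer word with higher strength, which is the effect the multiplication is meant to capture.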

After reviewing the recommendations (and perhaps disrec- 
ommendations), the user may assign their own rating to ex- 
amples they believe to be incorrectly ranked and retrain the 



The Fabric of Reality: The Science of Parallel Universes- And Its Implications
by David Deutsch recommended because:

Slot       Word           Strength
WORDS      MULTIVERSE     75.12
WORDS      UNIVERSES      25.08
WORDS      REALITY        22.96
WORDS      UNIVERSE       15.55
WORDS      QUANTUM        14.54
WORDS      INTELLECT      13.86
WORDS      OKAY           13.75
WORDS      RESERVATIONS   11.56
WORDS      DENIES         11.56
WORDS      EVOLUTION      11.02
WORDS      WORLDS         10.10
WORDS      SMOLIN          9.39
WORDS      ONE             8.50
WORDS      IDEAS           8.35
WORDS      THEORY          8.28
WORDS      IDEA            6.96
SUBJECTS   REALITY         6.78
TITLE      PARALLEL        6.76
WORDS      IMPLY           6.47
WORDS      GENIUSES        6.47

Table 2: Sample Recommendation Explanation

The word UNIVERSES is positive due to your ratings:

Title                                            Rating   Count
The Life of the Cosmos                                     15
Before the Beginning : Our Universe and Others   8         7
Unveiling the Edge of Time                                  3
Black Holes : A Traveler's Guide                 9         3
The Inflationary Universe                        9         2

Table 3: Sample Feature Explanation



system to produce improved recommendations. As with rel- 
evance feedback in information retrieval [|7p, this cycle can 
be repeated several times in order to produce the best results. 
Also, as new examples are provided, the system can track any 
change in a user's preferences and alter its recommendations 
based on the additional information. 

EXPERIMENTAL RESULTS 
Methodology 

Data Collection. Several data sets were assembled to evaluate
Libra. The first two were based on the first 3,061
adequate-information titles (books with at least one abstract,
review, or customer comment) returned for the subject search
"literature fiction." Two separate sets were randomly selected
from this dataset, one with 936 books and one with 935, and
rated by two different users. These sets will be called Lit1
and Lit2, respectively. The remaining sets were based on 
all of the adequate-information Amazon titles for "mystery" 
(7,285 titles), "science" (6,177 titles), and "science fiction" 
(3,813 titles). From each of these sets, 500 titles were chosen 
at random and rated by a user (the same user rated both the 
science and science fiction books). These sets will be called 



Data   Number Exs   Avg. Rating   % Positive (r > 5)
Lit1   936          4.19          36.3
Lit2   935          4.53          41.2
Myst   500          7.00          74.4
Sci    500          4.15          31.2
SF     500          3.83          20.0

Table 4: Data Information

                          Rating
Data     1     2     3     4     5     6     7     8     9    10
Lit1   271    78    67    74   106   125    83    70    40    22
Lit2   272    58    72    92    56    75   104    87    77    42
Myst    73    11     7     8    29    46    45    64    66   151
Sci     88    94    62    49    51    53    35    31    16    21
SF      56   119    75    83    67    33    28    21    12     6

Table 5: Data Rating Distributions



Myst, Sci, and SF, respectively. 

In order to present a quantitative picture of performance on 
a realistic sample, books to be rated were selected at random.
However, this means that many books may not have
been familiar to the user, in which case the user was asked
to supply a rating based on reviewing the Amazon page describing
the book. Table 4 presents some statistics about the
data and Table 5 presents the number of books in each rating
category. Note that overall the data sets have quite different 
ratings distributions. 

Performance Evaluation. To test the system, we performed
10-fold cross-validation, in which each data set is randomly 
split into 10 equal-size segments and results are averaged 
over 10 trials, each time leaving a separate segment out for 
independent testing, and training the system on the remain- 
ing data [|2|]. In order to observe performance given varying 
amounts of training data, learning curves were generated by 
testing the system after training on increasing subsets of the 
overall training data. A number of metrics were used to mea- 
sure performance on the novel test data, including: 

• Classification accuracy (Acc): The percentage of examples
correctly classified as positive or negative.

• Recall (Rec): The percentage of positive examples classified
as positive.

• Precision (Pr): The percentage of examples classified as
positive which are positive.

• Precision at Top 3 (Pr3): The percentage of the 3 top ranked
examples which are positive.

• Precision at Top 10 (Pr10): The percentage of the 10 top
ranked examples which are positive.

• F-Measure (F): A weighted average of precision and recall
frequently used in information retrieval:

\[ F = (2 \cdot Pr \cdot Rec) / (Pr + Rec) \]



Data     N    Acc    Rec     Pr     Pr3   Pr10      F    Rt3   Rt10    r_s
Lit1     5   63.5   49.0   50.3    63.3   62.0   46.5   5.87   6.02   0.31
Lit1    10   65.5   51.3   53.3    86.7   76.0   49.7   6.63   6.65   0.35
Lit1    20   73.4   64.8   62.6    86.7   81.0   62.6   7.53   7.20   0.62
Lit1    40   73.9   65.1   63.6    86.7   81.0   63.4   7.40   7.32   0.64
Lit1   100   79.0   70.7   71.1    96.7   86.0   70.5   8.03   7.44   0.69
Lit1   840   79.8   62.8   75.9    96.7   94.0   68.5   8.57   8.03   0.74
Lit2     5   59.0   57.6   52.4    70.0   74.0   53.3   6.80   6.82   0.31
Lit2    10   65.0   64.5   56.7    80.0   82.0   59.2   7.33   7.33   0.48
Lit2    20   69.5   67.2   63.2    93.3   91.0   64.1   8.20   7.84   0.59
Lit2    40   74.3   72.1   68.9    93.3   91.0   69.0   8.53   7.94   0.69
Lit2   100   78.0   78.5   71.2    96.7   94.0   74.4   8.77   8.22   0.72
Lit2   840   80.2   71.9   78.6   100.0   97.0   74.8   9.13   8.48   0.77
Myst     5   73.2   83.4   82.1    86.7   89.0   81.5   8.20   8.40   0.36
Myst    10   75.6   87.9   82.4    90.0   90.0   83.8   8.40   8.34   0.40
Myst    20   81.6   89.3   86.4    96.7   91.0   87.3   8.23   8.43   0.46
Myst    40   85.2   95.4   85.9    96.7   94.0   90.3   8.37   8.52   0.50
Myst   100   86.6   95.2   87.2    93.3   94.0   90.9   8.70   8.69   0.55
Myst   450   85.8   93.2   88.1    96.7   98.0   90.5   8.90   8.97   0.61
Sci      5   62.8   63.8   46.3    73.3   60.0   51.1   6.97   6.17   0.35
Sci     10   67.6   61.9   51.2    80.0   67.0   54.3   7.30   6.32   0.37
Sci     20   75.4   66.0   64.2    96.7   80.0   63.1   8.37   7.03   0.51
Sci     40   79.6   69.5   68.7    93.3   80.0   68.3   8.43   7.23   0.59
Sci    100   81.8   74.4   72.2    93.3   83.0   72.3   8.50   7.29   0.65
Sci    450   85.2   79.1   76.8    93.3   89.0   77.2   8.57   7.71   0.71
SF       5   67.0   38.3   32.9    40.0   29.0   28.2   5.23   4.34   0.02
SF      10   64.6   49.0   28.9    53.3   36.0   31.5   5.83   4.72   0.15
SF      20   71.8   45.8   37.4    66.7   37.0   37.8   6.23   5.04   0.21
SF      40   72.6   58.9   40.1    70.0   43.0   43.0   6.47   5.26   0.39
SF     100   76.4   65.7   46.2    80.0   56.0   52.4   7.00   5.75   0.40
SF     450   79.2   82.2   49.1    90.0   63.0   60.6   7.70   6.26   0.61

Table 6: Summary of Results



• Rating of Top 3 (Rt3): The average user rating assigned to 
the 3 top ranked examples. 

• Rating of Top 10 (Rt10): The average user rating assigned
to the 10 top ranked examples.

• Rank Correlation ($r_s$): Spearman's rank correlation coefficient
between the system's ranking and that imposed by the
user's ratings ($-1 \le r_s \le 1$); ties are handled using the
method recommended by [1].

The top 3 and top 10 metrics are given since many users will 
be primarily interested in getting a few top-ranked recom- 
mendations. Rank correlation gives a good overall picture of 
how the system's continuous ranking of books agrees with 
the user's, without requiring that the system actually predict 
the numerical rating score assigned by the user. A correlation 
coefficient of 0.3 to 0.6 is generally considered "moderate" 
and above 0.6 is considered "strong." 
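
The metrics listed above can be sketched for one test fold as follows. The helper names and toy data are hypothetical. Ties in the rank correlation are handled here by assigning average ranks, which is a common convention and an assumption; it may differ from the tie-handling method cited above.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Pearson correlation of the rank vectors (assumes neither is constant)."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


def evaluate(scores, ratings, threshold=5):
    """scores: log odds per test book; ratings: the user's 1-10 ratings."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    predicted = [s > 0 for s in scores]        # odds greater than 1
    actual = [r > threshold for r in ratings]  # rating 6-10 counts as positive
    tp = sum(p and a for p, a in zip(predicted, actual))
    correct = sum(p == a for p, a in zip(predicted, actual))
    precision = tp / sum(predicted) if any(predicted) else 0.0
    recall = tp / sum(actual) if any(actual) else 0.0
    f = 2 * precision * recall / (precision + recall) if tp else 0.0
    top3 = [ratings[i] for i in order[:3]]
    return {
        "Acc": correct / len(scores),
        "Rec": recall,
        "Pr": precision,
        "F": f,
        "Pr3": sum(r > threshold for r in top3) / len(top3),
        "Rt3": sum(top3) / len(top3),
        "rs": spearman(scores, ratings),
    }


# Toy fold, invented for illustration.
scores = [2.1, 0.5, -0.3, -1.7]
ratings = [9, 4, 7, 2]
metrics = evaluate(scores, ratings)
```

Top-10 variants follow the same pattern with `order[:10]`, and the percentages in the results table correspond to these fractions multiplied by 100.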

Basic Results 

The results are summarized in Table 6, where N represents
the number of training examples utilized and results are shown
for a number of representative points along the learning curve.
Overall, the results are quite encouraging even when the sys- 
tem is given relatively small training sets. The SF data set is 
clearly the most difficult since there are very few highly-rated 
books. 

The "top n" metrics are perhaps the most relevant to many 
users. Consider precision at top 3, which is fairly consis- 
tently in the 90% range after only 20 training examples (the 
exceptions are Lit1 until 70 examples¹ and SF until 450
examples). Therefore, Libra's top recommendations are
highly likely to be viewed positively by the user. Note that
the "% Positive" column in Table 4 gives the probability that
a randomly chosen example from a given data set will be 
positively rated. Therefore, for every data set, the top 3 and 
top 10 recommendations are always substantially more likely 
than random to be rated positively, even after only 5 training 
examples. 



¹ References to performance at 70 and 300 examples are based on learning
curve data not included in the summary in Table 6.




Figure 1: Lit1 Rank Correlation (x-axis: Training Examples)

Figure 2: Myst Precision at Top 10 (x-axis: Training Examples)



Considering the average rating of the top 3 recommenda- 
tions, it is fairly consistently above an 8 after only 20 training 
examples (the exceptions again are Lit1 until 100 examples
and SF). For every data set, the top 3 and top 10 recommendations
are always rated substantially higher than a randomly
selected example (cf. the average rating from Table 4).

Looking at the rank correlation, except for SF, there is at 
least a moderate correlation ($r_s > 0.3$) after only 10 examples,
and SF exhibits a moderate correlation after 40 examples.
This becomes a strong correlation ($r_s > 0.6$) for Lit1
after only 20 examples, for Lit2 after 40 examples, for Sci 
after 70 examples, for Myst after 300 examples, and for SF 
after 450 examples. 

Results on the Role of Collaborative Content 

Since collaborative and content-based approaches to recom- 
mending have somewhat complementary strengths and weak- 
nesses, an interesting question that has already attracted some 
initial attention [g, Q] is whether they can be combined to 
produce even better results. Since Libra exploits content 
about related authors and titles that Amazon produces using 
collaborative methods, an interesting question is whether this 
collaborative content actually helps its performance. To ex- 
amine this issue, we conducted an "ablation" study in which 
the slots for related authors and related titles were removed 
from Libra's representation of book content. The resulting 
system, called Libra-NR, was compared to the original one
using the same 10-fold training and test sets. The statistical
significance of any differences in performance between
the two systems was evaluated using a 1-tailed paired t-test
requiring a significance level of p < 0.05. 
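
A paired comparison of this kind can be sketched as below. The per-fold values are invented for illustration and are not results from the paper. Instead of computing a p-value, the sketch compares the t statistic to 1.833, the one-tailed critical value of Student's t at the 0.05 level with 9 degrees of freedom (10 folds).

```python
import math
from statistics import mean, stdev


def paired_t(a, b):
    """t statistic for paired samples, testing whether mean(a - b) > 0."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


# Hypothetical per-fold rank correlations for the two systems.
with_related = [0.74, 0.71, 0.77, 0.73, 0.75, 0.72, 0.76, 0.74, 0.73, 0.75]
without_related = [0.70, 0.69, 0.72, 0.71, 0.70, 0.70, 0.71, 0.72, 0.69, 0.71]

T_CRIT = 1.833  # one-tailed, p = 0.05, 9 degrees of freedom
t_stat = paired_t(with_related, without_related)
significant = t_stat > T_CRIT
print(round(t_stat, 2), significant)
```

Pairing by fold matters because both systems are trained and tested on the same splits, so fold-to-fold variation cancels in the differences.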

Overall, the results indicate that the use of collaborative content
has a significant positive effect. Figures 1, 2, and 3
show sample learning curves for different important metrics
for a few data sets. For the Lit1 rank-correlation results
shown in Figure 1, there is a consistent, statistically
significant difference in performance from 20 examples onward.
For the Myst results on precision at top 10 shown in
Figure 2, there is a consistent, statistically significant difference
in performance from 40 examples onward. For the SF
results on average rating of the top 3, there is a statistically- 
significant difference at 10, 100, 150, 200, and 450 examples. 
The results shown are some of the most consistent differ- 
ences for each of these metrics; however, all of the datasets 
demonstrate some significant advantage of using collabora- 
tive content according to one or more metrics. Therefore, in- 
formation obtained from collaborative methods can be used 
to improve content-based recommending, even when the ac- 
tual user data underlying the collaborative method is unavail- 
able due to privacy or proprietary concerns. 

FUTURE WORK 

We are currently developing a web-based interface so that 
Libra can be experimentally evaluated in practical use with 
a larger body of users. We plan to conduct a study in which 
each user selects their own training examples, obtains recom- 
mendations, and provides final informed ratings after reading 
one or more selected books. 



Another planned experiment is comparing Libra's content- 
based approach to a standard collaborative method. Given 
the constrained interfaces provided by existing on-line rec- 
ommenders, and the inaccessibility of the underlying propri- 
etary user data, conducting a controlled experiment using the 
exact same training examples and book databases is difficult. 
However, users could be allowed to use both systems and 
evaluate and compare their final recommendations.²

Since many users are reluctant to rate a large number of training
examples, various machine-learning techniques for maximizing
the utility of small training sets should be utilized.
One approach is to use unsupervised learning over unrated 
book descriptions to improve supervised learning from a smaller 
number of rated examples. A successful method for doing 
this in text categorization is presented in [23]. Another approach
is active learning, in which examples are acquired
incrementally and the system attempts to use what it has al- 
ready learned to limit training by selecting only the most 
informative new examples for the user to rate [||. Specific 
techniques for applying this to text categorization have been 
developed and shown to significantly reduce the quantity of 
labeled examples required Jit] , p^ . 

A slightly different approach is to advise users on easy and 
productive strategies for selecting good training examples 
themselves. We have found that one effective approach is to
first provide a small number of highly rated examples (which
are presumably easy for users to generate), run the system
to generate initial recommendations, review the top
recommendations for obviously bad items, provide low ratings
for these examples, and retrain the system to obtain
new recommendations. We intend to conduct experiments on 
the existing data sets evaluating such strategies for selecting 
training examples. 

Studying additional ways of combining content-based and 
collaborative recommending is particularly important. The 
use of collaborative content in Libra was found to be useful,
and if significant databases of both user ratings and item
content are available, both of these sources of information 
could contribute to better recommendations y, Q]. One ad- 
ditional approach is to automatically add the related books 
of each rated book as additional training examples with the 
same (or similar) rating, thereby using collaborative informa- 
tion to expand the training examples available for content- 
based recommending. 

Additional topics for investigation include the following.

• Allowing a user to initially provide keywords that are of 
known interest (or disinterest), and incorporating this information
into learned profiles by biasing the parameter estimates
for these words [24].

² Amazon has already made significantly more income from the first author
based on recommendations provided by Libra than those provided by
its own recommender system; however, this is hardly a rigorous, unbiased
comparison.

• Comparing different text-categorization algorithms: In ad- 
dition to more sophisticated Bayesian methods, neural-network 
and case-based methods could be explored. 

• Combining content extracted from multiple sources: For 
example, combining information about a title from Amazon, 
BarnesAndNoble, on-line library catalogs, etc. 

• Using full-text as content: A digital library should be able 
to efficiently utilize the complete on-line text, as well as ab- 
stracted summaries and reviews, to recommend items. 

CONCLUSIONS 

The ability to recommend books and other information sources 
to users based on their general interests rather than specific 
enquiries will be an important service of digital libraries. 
Unlike collaborative filtering, content-based recommending 
holds the promise of being able to effectively recommend un- 
rated items and to provide quality recommendations to users 
with unique, individual tastes. Libra is an initial content- 
based book recommender which uses a simple Bayesian learn- 
ing algorithm and information about books extracted from 
the web to recommend titles based on training examples sup- 
plied by an individual user. Initial experiments indicate that 
this approach can efficiently provide accurate recommenda- 
tions in the absence of any information about other users. 

In many ways, collaborative and content-based approaches 
provide complementary capabilities. Collaborative methods 
are best at recommending reasonably well-known items to 
users in communities of similar tastes when sufficient user
data is available but effective content information is not. Content- 
based methods are best at recommending unpopular items to 
users with unique tastes when sufficient other user data is 
unavailable but effective content information is easy to ob- 
tain. Consequently, as discussed above, methods for integrat- 
ing these approaches will perhaps provide the best of both 
worlds. 

Finally, we believe that methods and ideas developed in machine
learning research [22] are particularly useful for content-based
recommending, filtering, and categorization, as well as
for integrating with collaborative approaches [||, 0] . Given 
the future potential importance of such services to digital li- 
braries, we look forward to an increasing application of ma- 
chine learning techniques to these challenging problems. 